(54) Associating text derived from audio with an image



(57) System with methods and apparatus for rendering text converted
from audio with an image. The image is captured using a
photo-sensitive film camera or digital camera, or created using
computer graphics software. Audio is captured either at the time of
image capture or at another time. The captured image and audio are
stored and associated with each other using a multimedia file format.
The audio is converted to text using voice recognition software. A
composite image is formed from the image and the converted text by
positioning the converted text on or near the image. The composite
image is output on a computer monitor, printer, or other output device.

Description



[0001] The invention relates to associating text with an image, where
the text is derived from audio associated with the image.

[0002] Cameras capture images and store them on film or a digital
medium. Between the time an image is captured and the time it is
printed or otherwise displayed, the photographer may forget or lose
access to information related to the image, such as the time at which
it was captured or the location of its subject matter.

[0003] Some film cameras and digital cameras allow text, such as text
representing the date on which an image was captured, or digital
information to be associated with a photograph. This text is typically
created by the camera and superimposed on the image at a predetermined
location in a predetermined format.
[0004] A digital camera captures an image and stores it in digital
format on a computer-readable disk, flash memory, or other persistent
storage medium. Each image may be stored in a separate file according
to a standard format. The file may then be transferred to the memory of
a computer, where it may then be operated on by computer software.

[0005] Audio and other information may be associated with an image
file. The Kodak FlashPix image specification, for example, specifies a
standard file format for storing images captured with a digital camera.
An extension to the FlashPix specification allows one or more audio
streams to be associated with, and therefore stored with, an image
file. Software working in compliance with the extension may play back
one or more of the audio streams associated with an image file while
displaying the image.

[0006] Voice recognition software converts audio signals representing
human speech into text. Voice recognition software may recognize a
limited number of words or be more general and create text by
classifying speech phonetically. Voice recognition software can create
computer-readable text from digitally represented audio. The text thus
created can then be interpreted and manipulated by computer software
and stored on computer-readable media.
[0007] It is possible to associate audio or text with a photograph at
the time the photograph is captured by carrying a tape recorder or a
notepad with the camera to record information associated with the
picture being taken. Some digital cameras allow direct recording of
audio with a picture while it is being captured. The audio may then be
played back when the picture is viewed.

Summary of the Invention

[0008] In one aspect, the invention applies a computational
speech-to-text conversion process to audio data in a computer-readable
memory to produce converted text. A composite image is created by
compositing image data stored in the computer-readable memory and the
converted text. The composite image is then printed on a printer or
other suitable output device. The image data may represent an image
taken by a camera and the audio data may represent speech recorded at
about the same time the image was taken. The camera may be a digital
camera comprising a microphone and be operable to record speech and to
associate recorded speech with images taken by the camera.
[0009] The audio data and image data may be components of a single
source file, which may be a file in FlashPix format. The audio data and
image data may originate in separate source files. The audio and image
data may be linked by a tag stored with the audio data or with the
image data or with both the audio data and image data. The converted
text may be stored on a mass storage device as an alias to the audio
data. The converted text may be composited so as to cover a portion of
the image represented by the image data, or so as not to cover any
portion of the image represented by the image data.

[0010] The image data may represent a sequence of single images, and
the audio data may represent a sequence of audio segments. One audio
segment of the sequence of audio segments may be matched with one
single image of the sequence of single images, before converting the
one audio segment into a converted text segment and creating a single
composite image by compositing the one single image and the converted
text segment. Alternatively, for each single image of the sequence of
single images, a composite image may be created by compositing the
single image and the converted text.
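
The one-to-one matching described in this paragraph can be sketched as
follows. This is a minimal illustration, not the claimed method: it
assumes that matching is done by position in the two sequences, and the
names caption_sequence and speech_to_text are hypothetical placeholders
for whatever conversion process is actually used.

```python
from typing import Callable, List, Tuple


def caption_sequence(
    images: List[str],
    audio_segments: List[bytes],
    speech_to_text: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Pair each single image with text converted from its audio segment.

    Assumption: the i-th audio segment belongs to the i-th image.
    """
    if len(images) != len(audio_segments):
        raise ValueError("expected exactly one audio segment per image")
    # Convert each matched segment and keep it next to its image.
    return [
        (image, speech_to_text(segment))
        for image, segment in zip(images, audio_segments)
    ]
```

Each resulting pair would then be handed to a compositing step to form
one composite image per single image.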

[0011] One advantage of the present invention is that it associates
text derived from audio data with image data. Because text may
typically be stored, transmitted, and manipulated more efficiently than
audio data, converting audio data to text and associating the converted
text with an image can be advantageous over storing audio with an
image.
[0012] A further advantage of the present invention is that it allows
the association of audio data with image data to persist when the image
data is printed or when sound reproduction is not possible or
practical, by converting the audio data to converted text and printing
the converted text with the image data. In this way, the invention
provides printed descriptive information, as contained in the audio
data, relating to image data without requiring additional effort by the
user.
[0013] A further advantage of the invention is that converted text can
be stored in a standard computer text format, and thus may be
manipulated by standard computer text processing software.
[0014] Another advantage of the invention is that the image data and
associated audio data may be captured contemporaneously or at different
times. They may be captured using a single device, such as a digital
camera with a microphone and audio capture capabilities, or by
different devices. Either the image data or the audio data, or both,
may be human-originated or [...] created. The invention thus [...]
manipulating and storing images [...] and text.

[0015] Other features and advantages of the invention will be apparent
from the following description and from the claims.

Brief Description of the Drawings

[0016] FIG. 1 is a block diagram of a system made in accordance with
the invention.
[0017] FIG. 2 is a flow diagram of the sequence of events initiated by
a user choice to print an image file using the text associator process.
[0018] FIG. 3 illustrates converted text associated with an image.
[0019] FIG. 4 is a flow diagram of a method for converting audio data
in a foreign language to subtitles in another language.
[0020] FIG. 5 illustrates a computer and computer elements suitable for
implementing the invention.

Detailed Description

[0021] Referring to FIG. 1, a computer 100a displays output on a
monitor 110 connected at I/O port 200a, obtains input from a keyboard
120 connected at I/O port 200b, and outputs hardcopy on a printer 190
connected at I/O port 200d. The computer 100a is also connected to a
hard disk 180 for storing and retrieving files and other data at I/O
port 200c, to a LAN 210 for communicating with other computers 100b-d,
and to the Internet 220 through the LAN 210 for storing, archiving, and
retrieving information. A digital camera 170 is capable of capturing a
still image. The digital camera 170 also contains a microphone for
capturing audio data associated with the image. The digital camera 170
converts the image and audio data into digital form and stores them in
a multimedia file format with an audio component and an image
component, such as the FlashPix format. The file is transferred to the
computer 100a through I/O port 200e.
[0022] An image processing application 140, such as the Adobe Acrobat
program (available from Adobe Systems Incorporated of San Jose,
California), runs on the computer 100a. Also running on the computer
100a is a voice recognition application 150, such as Dragon Systems'
Dragon Dictate, capable of converting audio data representing speech
into converted text and storing the converted text in a
computer-readable file. Also running on the computer 100a is a text
associator application 160 for associating converted text with an
image. The text associator application 160 communicates with the image
processing application 140 through an image processing API 145, which
includes procedures for importing images, audio, and text into a file
of the image processing application 140. The text associator
application 160 communicates with the voice recognition application 150
through a voice recognition API 155, which includes procedures for
converting audio data to converted text and for saving converted text
in a text file.

[0023] The text associator application 160 presents the user with a
number of selectable options. One such option is an option to print a
source file, such as a FlashPix file. The source file minimally
contains an image component, but may also contain or be associated with
an audio component. Referring to FIG. 2, when a user selects the option
to print a source file (step 300), the text associator application 160
determines whether an audio component is associated with the image
component contained in the source file (step 310). If no audio
component is associated with the image component, the text associator
process 160 opens the source file in the image processing application
140 by calling the image processing API 145 open file function (step
320). The text associator process 160 then prints the image component
by calling the image processing API 145 print function (step 370).
[0024] If an audio component is contained in or associated with the
source file, the user is presented with an option to convert the audio
component to converted text (step 330). If the user declines the
option, then the image component is opened and printed (steps 320 and
370). If the user accepts the option, the text associator application
160 locates the audio component (step 340). The text associator
application 160 then converts the audio component to converted text by
calling the voice recognition application API 155 speech-to-text
function (step 350). The voice recognition application 150 stores the
converted text in a file and passes the filename to the text associator
application 160, which imports the converted text into the open source
file by calling the image processing API 145 import text function (step
355).

[0025] Next, the text associator application 160 positions the imported
converted text in relation to the image component by calling the image
processing API 145 text positioning functions, creating a composite
image (step 360). Positioning of converted text may include a default
text position, which may be the lower-center section of the image
component. However, the converted text may be placed anywhere within
the area of the image component, at any orientation. It may also be
placed using any text placement features, such as right-left-center
justification, or flow along a curve. The converted text may also be
placed anywhere external to the area of the image component, e.g.,
underneath the image as a caption, or in a separate location from the
image, e.g., as Microsoft PowerPoint Notes pages. The converted text
can be formatted using the text formats available in the image
processing application 140, which may include a default font and a
default point size. The converted text may be stored in the image
processing application text format, in vector or bitmap format, or as a
[...] component file.
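
As a rough illustration of the default lower-center position and of the
caption placement underneath the image, the following sketch computes
where a text box would go. It is an assumption-laden example, not the
image processing API 145: the function names, the pixel units, the
top-left origin with y increasing downward, and the margin value are
all hypothetical.

```python
from typing import Tuple


def lower_center_origin(
    image_w: int, image_h: int, text_w: int, text_h: int, margin: int = 10
) -> Tuple[int, int]:
    """Top-left corner of a text box centred horizontally near the
    bottom edge, inside the image area."""
    x = (image_w - text_w) // 2
    y = image_h - text_h - margin
    # Clamp so oversized text still starts inside the image.
    return max(x, 0), max(y, 0)


def caption_origin(
    image_w: int, image_h: int, text_w: int, margin: int = 10
) -> Tuple[int, int]:
    """Top-left corner of a text box placed underneath the image,
    so that no portion of the image is covered."""
    x = max((image_w - text_w) // 2, 0)
    return x, image_h + margin
```

The first function corresponds to text that covers a portion of the
image; the second corresponds to text that covers none of it.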
[0026] After the imported converted text has been positioned in
relation to the image component, the text associator application 160
prints the image component, including the positioned imported converted
text, by calling the image processing application API 145 print
function (step 370). An example of converted text associated with an
image is shown in FIG. 3.
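
The decision flow of FIG. 2 (steps 300 through 370) can be summarized
in a short sketch. Everything named here is hypothetical: SourceFile,
PrintJob, build_print_job, and the speech_to_text and ask_user
callables stand in for the image processing API 145 and voice
recognition API 155, whose actual signatures the description does not
give.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SourceFile:
    """Hypothetical stand-in for a source file such as a FlashPix file."""
    image: str
    audio: Optional[bytes] = None


@dataclass
class PrintJob:
    """What would be passed to the print function at step 370."""
    image: str
    text: Optional[str] = None
    position: Optional[str] = None


def build_print_job(
    source: SourceFile,
    speech_to_text: Callable[[bytes], str],
    ask_user: Optional[Callable[[], bool]] = None,
) -> PrintJob:
    # Step 310: is an audio component associated with the image?
    if source.audio is None:
        return PrintJob(source.image)  # steps 320 and 370
    # Step 330: offer conversion; with no prompt, run automatically.
    if ask_user is not None and not ask_user():
        return PrintJob(source.image)  # steps 320 and 370
    # Steps 340-350: locate the audio and convert it to text.
    text = speech_to_text(source.audio)
    # Steps 355-360: import the text and use the default position.
    return PrintJob(source.image, text, "lower-center")
```

Passing ask_user=None models a run in which the user is not prompted.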

[0027] Although the method of FIG. 2, as described above, prompts the
user for input at steps 330 and 360, it may also operate automatically,
without user input. When operating automatically, steps 330 and 335 are
removed, and any audio component that is contained in or associated
with a source file is automatically located and converted to converted
text at steps 340 and 350. After the text has been imported at step
355, the converted text is positioned automatically at step 360 using
default position settings. The image component, including the
positioned imported converted text, is then printed at step 370. This
process can operate automatically on multiple source files by using
wildcards in the source file specification at step 300, or by using a
script or batch file.
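
A wildcard source file specification of the kind mentioned here could
be expanded as in the sketch below. It assumes shell-style patterns and
uses Python's standard fnmatch module; the function name and the file
names in the usage note are illustrative only.

```python
import fnmatch
from typing import Iterable, List


def select_source_files(names: Iterable[str], pattern: str) -> List[str]:
    """Return the names matching a shell-style wildcard pattern,
    sorted so that batch runs process files in a stable order."""
    return sorted(
        name for name in names if fnmatch.fnmatchcase(name, pattern)
    )
```

For instance, the pattern trip*.fpx would select trip01.fpx and
trip02.fpx from a directory listing while skipping notes.txt; each
selected file could then go through the automatic flow described above.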
[0028] Alternatively, the text associator application 160 may be
implemented as a plug-in to the image processing application 140. In
this form, the text associator application 160 is available as an
option in the image processing application 140, which may be selected
by the user at any time that an image is displayed on the monitor 110.
The user may add converted text previously converted from audio to the
displayed image by selecting a source text file, to which the method of
step 360 is then applied. If an audio component is currently associated
with the displayed image, then the user may choose to convert the audio
component into converted text and import the converted text into the
image processing application using the methods of steps 350 and 355.
The user may also choose to associate audio with the displayed image by
selecting a separate audio file or by using a microphone to provide
audio data to the image processing application 140, and then convert
the audio into converted text using the methods of steps [...].

[0029] An image may be saved with its associated audio and/or converted
text components by a number of means. The converted text component may
be attached directly to the image using the image processing
application's 140 standard manner. The converted text component may be
stored within the same file as the audio component file, within the
image component file, or in a separate text file. The file may be saved
by making the converted text an alias to an audio component of the
application program, which in turn would be an additional component of
the saved file. If the audio component is changed, use of an alias
would allow automatic [...] of the new [...] method of step 350. If an
image is to be saved in a single file containing multiple images, a tag
or identifier may be created associating the converted text component
with a specific one of the images.
[0030] An audio component may be associated with an image component by
a number of means. The audio component may be saved in the same file as
the image component according to the image processing application's 140
standard procedure, in which case a tag on the audio component and on
the image component is used to indicate an association between the
components. Such a tagging scheme is used, for example, by the FlashPix
format.
[0031] Alternatively, the audio component may be stored in a separate
file from the file in which the image component is stored. In this
case, a component of the filename of the separate audio file may match
a component of the filename of the image component file. A digital tag
in the audio component file may match a digital tag in the image
component file. If there is more than one audio component to be
associated with an image component file, then the same image tag is
shared among audio component files. If there is more than one image
component sharing the same audio components, then the same audio tag is
shared among the respective image component files. The user may
manually associate the audio component with the image component (e.g.,
audio tape to analog film, audio captured independently of the image).
A part of the audio component may be a tag for a part of the image
component (e.g., the word 'three' may be used to match the third image
of multiple images in the image component).
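
One way to realize the filename-based association described in this
paragraph is to treat the filename stem as the shared component. The
sketch below makes that assumption explicitly; the description does not
say which part of the filename is compared, and link_by_stem is a
hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List


def link_by_stem(
    image_files: Iterable[str], audio_files: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each image file to the audio files sharing its filename stem.

    Assumption: the stem (name without extension) is the matching
    component, e.g. beach.jpg pairs with beach.wav.
    """
    audio_by_stem: Dict[str, List[str]] = defaultdict(list)
    for audio in audio_files:
        audio_by_stem[PurePath(audio).stem].append(audio)
    # An image with no matching audio maps to an empty list.
    return {
        image: list(audio_by_stem.get(PurePath(image).stem, []))
        for image in image_files
    }
```

Several audio files may share one stem, which mirrors the case of more
than one audio component associated with a single image component file.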
[0032] Conversion of audio data to converted text, and subsequent
association of the converted text with image data, may be accomplished
by a number of means. If the audio data is stored in a digital audio
format that is not recognized by the voice recognition application 150,
the audio data may be played through a speaker, or converted to analog
audio signals and output on an output line using appropriate player
software, and then imported into the voice recognition application 150
in an appropriate digital audio format through a microphone or an input
line.
[0033] The voice recognition application 150, which may be Dragon
Systems' Dragon Dictate, may be used to convert the digital audio data
to converted text using a standard API. The image processing
application 140 then uses standard text importation functions to import
the converted text and to position the converted text in relation to an
image as in step 360. The image, including the positioned converted
text, may then be printed according to standard methods, displayed on a
monitor, or saved in a file as described above.

[0034] If an image is other than a digital image, such as a film
negative or print, the method of step 350 can be used to convert the
audio data to converted text. Then standard imprinting methods can be
used to physically print the converted text onto the image.
[0035] Audio, text, and image components may take a number of forms and
be created by a number of means including, but not limited to, the
following.

[0036] An image can be created by capturing the image with a digital
camera or other imaging device, such as a film camera or a VHS
recorder. Images may be created by digitizing photographs, scanning
objects, or converting vector images to rasterized form.
[0037] Digital bit-mapped and pixel-mapped image formats which may be
used include, but are not limited to, Graphics Interchange Format
(GIF), Joint Photographic Experts Group format (JPEG), Tagged Image
File Format (TIFF), Microsoft Windows bitmapped-graphics format (BMP),
Adobe Photoshop format, and FlashPix format. Vector images which may be
used include, but are not limited to, PostScript files, Adobe
Illustrator files, and converted bitmapped images. Analog images which
may be used include, but are not limited to, photographic film images
(single or multiple frame, negative or positive), and motion video
images such as VHS. Images representing a document page or document
page component which may be used include an Adobe Portable Document
Format (PDF) page or sub-page, an image in a word processing document,
or a spreadsheet cell or cells.
[0038] An image may contain multiple frames, in which case the user may
be presented with an option to include the associated converted text on
just the first frame or on a plurality of the multiple frames. The user
may also choose to distribute the associated converted text across one
or more of the multiple frames by, for example, associating a distinct
portion of the converted text with each of the multiple frames of the
image component.
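
Distributing converted text across frames could be done in many ways.
The sketch below shows one simple possibility, splitting on word
boundaries into nearly equal portions; this splitting rule and the name
distribute_text are assumptions for illustration and are not taken from
the description.

```python
from typing import List


def distribute_text(text: str, frame_count: int) -> List[str]:
    """Split converted text into one distinct portion per frame,
    keeping whole words and giving earlier frames any extra word."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    words = text.split()
    base, extra = divmod(len(words), frame_count)
    portions = []
    start = 0
    for index in range(frame_count):
        size = base + (1 if index < extra else 0)
        portions.append(" ".join(words[start:start + size]))
        start += size
    return portions
```

A frame receives an empty string when there are more frames than words.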

[0039] Audio data may be created using a digital or analog audio
recorder that is independent of the device used to create or capture
the image data. For example, image data may be captured by a digital
camera 170 and the audio data captured on a digital audio tape (DAT) by
a microphone and DAT recorder. The image data and audio data may then
each be separately imported into the computer 100a using standard
means. The audio data may, but need not be, recorded speech.

[0040] The audio data may be a digital or analog recording. The audio
data may be compressed or non-compressed. A digital recording may be
stored in a computer-supported format, such as RealAudio, QuickTime, or
FlashPix. An analog recording may be stored on conventional tape, film
strips, or other media, and converted to text by the voice recognition
application 150 by, for example, playing the recording through a
speaker and capturing the audio data with a microphone attached to an
I/O port 200a-f of the computer 100a, or converted to analog audio
signals and output on an output line using appropriate player software
and then imported into the voice recognition application 150 through an
input line.

[0041] Image data and associated audio data may, but need not be,
created or captured at the same time. For example, audio data from a
library of prerecorded samples may be associated with image data either
at the time of image capture or at a later time. In some situations it
might be useful to capture image data and record associated audio data
later, such as when viewing the image data on a computer monitor.
[0042] Although the invention may use converted text derived from
speech, it may also derive converted text from any component of an
audio recording. For example, at step 350 a recording of a human
singing may be converted to text, or a single voice may be selected
from an audio recording containing multiple voices and then converted
to converted text.

[0043] In an alternative embodiment, subtitles are superimposed on a
motion picture, where the audio data is in the native language of the
film, and where the converted text is in another language. Referring to
FIG. 4, audio data from a film is captured (step 400). The audio data
is converted to converted text using voice recognition software 150
(step 410). The converted text is translated to the desired subtitle
language, using software such as the Nuance Speech Recognition Engine
developed by Nuance Communications and SRI International of Menlo Park,
California (step 420). The translated converted text is superimposed on
the series of images with which the audio data is associated (step
430). Similarly, in conjunction with audio data from a video displayed
on a video display, e.g., a television, the audio data is converted to
converted text and superimposed on the video to support the hearing
impaired.
[0044] Referring to FIG. 5, the invention may be implemented in digital
electronic circuitry or in computer hardware, firmware, software, or in
combinations of them. Apparatus of the invention may be implemented in
a computer program product tangibly embodied in a machine-readable
storage device for execution by a computer processor; and method steps
of the invention may be performed by a computer processor executing a
program to perform functions of the invention by operating on input
data and generating output. Suitable processors 500 include, by way of
example, both general and special purpose microprocessors. Generally, a
processor will receive instructions and data from a read-only memory
(ROM) 510 and/or a random access memory (RAM) 505 through a CPU bus. A
processor can generally also receive programs and data from a storage
medium such as an internal disk 545 operating through a mass storage
interface 540 or a removable disk 535 operating through an I/O
interface 530. The flow of data over an I/O bus 525 to and from I/O
devices 535 and 545, the processor 500, and memory 505, 510 is
controlled by an I/O controller 515. User input is obtained through a
keyboard, mouse, stylus, microphone, trackball, touch-sensitive screen,
or other input device. These elements will be found in a conventional
desktop or workstation computer as well as other computers suitable for
executing computer programs implementing the methods described here,
which may be used in conjunction with any digital print engine or
marking engine, display monitor, or other raster output device capable
of producing color or gray scale pixels on paper, film, display screen,
or other output medium.
[0045] By way of example, a printing device 550 implementing an
interpreter for a page description language, such as the PostScript
language, includes a microprocessor 570 for executing program
instructions (including font instructions) stored on a printer random
access memory (RAM) 580 and a printer read-only memory (ROM) 590 and
controlling a printer marking engine 600. The RAM 580 is optionally
supplemented by a mass storage device such as a hard disk (not shown).

[0046] Storage devices suitable for tangibly embodying computer program
instructions include all forms of non-volatile memory, including by way
of example semiconductor memory devices, such as EPROM, EEPROM, and
flash memory devices; magnetic disks such as internal hard disks 545
and removable disks 535; magneto-optical disks; and CD-ROM disks. Any
of the foregoing may be supplemented by, or incorporated in,
specially-designed ASICs (application-specific integrated circuits).
[0047] Although elements of the invention are described in terms of a
software implementation, the invention may be implemented in software,
hardware, or firmware, or a combination of the three.
[0048] The present invention has been described in terms of an
embodiment. The invention, however, is not limited to the embodiment
depicted and described. Rather, the scope of the invention is defined
by the claims.



1. A computer-implemented method for displaying an image with text,
comprising:

    providing audio data and image data in a computer-readable memory;
    applying a computational speech-to-text conversion process to the
    audio data to produce converted text;
    creating a composite image by compositing the image data and the
    converted text; and
    displaying the composite image.

2. The method of claim 1, wherein:

    displaying comprises printing the composite image.

3. The method of claim 1 or 2, wherein:

    the audio data and the image data are components of a single
    source file, preferably a FlashPix file.



4. The method of claim 2, wherein:

    the audio data and the image data originate in separate source
    files.






5. The method of claim 2, wherein:

    the computer-readable memory is a random access memory of a
    computer operable to execute computer program instructions;
    the computational speech-to-text conversion process comprises
    computer program instructions executing on the computer;
    the step of providing audio data and image data in a
    computer-readable memory comprises obtaining information linking
    the audio data and the image data, finding the image data and the
    audio data on a mass storage device, and reading the image data
    and the audio data from the mass storage device into the random
    access memory; and
    the step of creating a composite image comprises importing the
    image data and the converted text into an image processing
    application program, executing the image processing application
    program to format the converted text, to place the text with
    respect to the image, and to composite the placed, formatted text
    with the image data to produce the composite image.




6. The method of claim 5, wherein the image data and the audio data are
in separate files stored on the mass storage device and the information
linking the audio data and the image data to each other is a tag stored
in the separate files.




7. The method of claim 5, wherein the converted text is composited so
as to cover a portion of the image represented by the image data.

8. The method of claim 5, wherein the converted text is composited so
as not to cover any portion of the image represented by the image data.

9. The method of claim 5, wherein the image data represents a single
image.

10. The method of claim 5, wherein the image data represents a sequence
of single images and the audio data represents a sequence of audio
segments, the method further comprising:

    matching one audio segment of the sequence of audio segments with
    one single image of the sequence of single images;
    converting the one audio segment into a converted text segment;
    and


BNSOOCID: <EP_0905679A2_I_> 






    creating a single composite image by compositing the one single
    image and the converted text segment.

11. The method of claim 2, wherein the image data represents a sequence
of single images, the method further comprising:

    for each single image of the sequence of single images, creating a
    composite image by compositing the single image and the converted
    text; and
    printing each of the composite images.

12. The method of claim 11, wherein the camera is a digital camera
comprising a microphone and is operable to record speech and to
associate recorded speech with images taken by the camera.

13. The method of claim 1 for printing an image with
text, further comprising:

reading image data and audio data into a random
access memory of a computer operable to
execute computer program instructions;
applying a computation speech-to-text conversion
process, comprising computer program
instructions executing on the computer, to the
audio data in the random access memory to
produce converted text;
importing the image data and the converted text
into an image processing application program,
executing the image processing application
program to format the converted text, to place
the text near the bottom center of the image,
and to composite the centered, formatted text
with the image data to produce a composite
image; and
printing the composite image.
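
The placement step in claim 13 ("near the bottom center of the image") amounts to simple arithmetic, sketched below in Python. The function name `bottom_center_origin`, the pixel units, and the `margin` parameter are assumptions made for illustration; the claim does not specify how the position is computed.

```python
from typing import Tuple


def bottom_center_origin(image_w: int, image_h: int,
                         text_w: int, text_h: int,
                         margin: int = 10) -> Tuple[int, int]:
    """Return the top-left (x, y) of a text box placed near the bottom center.

    Hypothetical helper: coordinates are in pixels, y grows downward.
    """
    # Center the text box horizontally within the image.
    x = (image_w - text_w) // 2
    # Leave `margin` pixels between the text box and the bottom edge.
    y = image_h - text_h - margin
    # Clamp so an oversized text box still starts inside the image.
    return max(x, 0), max(y, 0)
```

For a 640 x 480 image and a 200 x 20 text box with the default margin, the text box starts at (220, 450).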

14. A computer program product tangibly stored on a
computer-readable medium, comprising instructions
to:

read audio data and image data into a
computer-readable memory;
convert the audio data to text;
composite the image data and the converted
text to create a composite image; and
display the composite image, preferably by
printing the composite image.

15. A system for displaying an image with text,
comprising:

means for reading audio data and image data
into a computer-readable memory;
means for converting the audio data to text;
means for compositing the image data and the
converted text to create a composite image;
and
means for displaying the composite image,
preferably by printing the composite image.

16. The system of claim 15, further comprising:

means for reading image data and audio data
into a random access memory of a computer
operable to execute computer program
instructions;
means for applying a computation speech-to-text
conversion process, comprising computer
program instructions executing on the computer,
to the audio data in the random access
memory to produce converted text; and
means for importing the image data and the
converted text into an image processing
application program, executing the image
processing application program to format the
converted text, to place the text near the bottom
center of the image, and to composite the
centered, formatted text with the image data to
produce a composite image.

17. The system of claim 15, wherein the computer-readable
memory is a random access memory of a
computer operable to execute computer program
instructions, the system further comprising:

means for reading image data and audio data
into a random access memory of a computer
operable to execute computer program
instructions;

means for applying a computation speech-to-text
conversion process, comprising computer
program instructions executing on the computer,
to the audio data in the random access
memory to produce converted text; and
means for obtaining information linking the
audio data and the image data to each other,
finding the image data and the audio data stored
on a mass storage device, and reading the
image data and the audio data from the mass
storage device into the random access memory;
and

means for importing the image data and the
converted text into an image processing
application program, executing the image
processing application program to format the
converted text, to place the text with respect to
the image, and to composite the placed,
formatted text with the image data to produce
the composite image.



18. The system of claim 15, wherein the image data
represents a sequence of single images and the
audio data represents a sequence of audio segments,
the system further comprising:

means for matching one audio segment of the
sequence of audio segments with one single
image of the sequence of single images;

means for converting the one audio segment
into a converted text segment; and

means for creating a single composite image
by compositing the one single image and the
converted text segment.

19. The system of claim 15, wherein the image data
represents a sequence of single images, the method
further comprising:

means for creating a composite image for each
single image of the sequence of single images
by compositing the single image and the
converted text; and

means for printing each of the composite
images.
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